Chapter 01: Log-Linear Model

1. Introduction

Solomon and Wynne, in 1953, conducted an experiment on dogs. They wanted to understand whether dogs can learn from mistakes, so to speak. Specifically, they were interested in avoidance learning: when dogs are given trauma-inducing shocks, will they learn to avoid shocks in the future?

We can state the objectives of the experiment, according to our understanding, in more general terms as follows:

  1. Can the dogs learn?

  2. Can they retain and recall what they learnt?

The experimental setup, designed to serve these objectives, holds a dog in a closed compartment with steel flooring, open on one side with a small barrier the dog can jump over to the other side. A high-voltage electric shock is discharged into the steel floor intermittently to stimulate the dog. The dog is then left with a choice: either take the shock for that trial, or jump over the barrier to the other side and save itself. Several dogs were recruited for the experiment.

The following picture is an illustration of the setup.

[Figure: dog_setup — illustration of the avoidance-learning apparatus]

More details of the experiment can be found here.



In this chapter, we will analyze the experimental data using Bayesian analysis, and the inference will be carried out in Pyro. The organization of the notebook is inspired by Bayesian Workflow by Prof. Andrew Gelman et al. Another piece of work in that direction is from Betancourt et al here. However, the current analysis is a WIP and far from perfect.

An almost-always first step in Bayesian analysis is to elicit a plausible generative model that would likely have generated the observed data. In this case, consider the model suggested/implemented in WinBUGS Vol 1.

We want to model the relationship between future avoidance and past traumatic experience. The following log-linear model is a starting point:

\(\pi_j = A^{x_j} B^{j-x_j} \)

where :

  • \(\pi_j\) is the probability of a dog getting shocked at trial \(j\)

  • \(x_j\) is number of successful avoidances of shock prior to trial \(j\).

  • \(j-x_j\) is number of shocks experienced prior to trial \(j\).

  • \(A\) and \(B\) are both unknown and treated as random variables.
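To make the model concrete, here is a minimal numeric sketch of \(\pi_j = A^{x_j} B^{j-x_j}\). The values of \(A\) and \(B\) below are purely hypothetical for illustration; in the analysis they are estimated from data, not assumed.

```python
# Hypothetical learning rates, both in (0, 1); NOT estimates from the data.
A, B = 0.8, 0.9

def pi_j(j, x_j, A=A, B=B):
    """P(shock at trial j) = A^x_j * B^(j - x_j)."""
    return (A ** x_j) * (B ** (j - x_j))

# A dog that avoided 3 times in the first 10 trials:
print(pi_j(10, 3))  # ~0.245, well below B**10 ~ 0.349 for a dog with no avoidances
```

Since \(A < B\) here, each avoidance lowers the shock probability more than each shock does, capturing the idea that successful avoidance is the stronger teacher.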

However, the model is only partially specified. In the Bayesian setting, we need to elicit our prior beliefs about the unknowns; consequently, we need to give priors to \(A\) and \(B\), which we do shortly. Before that, we need some boilerplate code, mostly imports. Note that all the code (functions) is glued together in the base class; the interested reader can always browse the code repo for a better understanding.

import torch
import pyro
import pandas as pd
import itertools
from collections import defaultdict
import matplotlib.pyplot as plt
import pyro.distributions as dist
import seaborn as sns
import plotly
import plotly.express as px
import plotly.figure_factory as ff
from chapter01 import base
import numpy as np

pyro.set_rng_seed(1)

plt.style.use('default')

%matplotlib inline
%load_ext autoreload

Data



The data covers 30 dogs, each subjected to 25 trials.

The plot that follows shows the dogs data (held in dictionary format), averaged over the dog population for each trial. The plotted quantity is the fraction of dogs shocked at trial \(j\):

\(\bar{y}_j = 1 - \frac{1}{30}\sum_{i=1}^{30} Y_{ij}\)

dogs_data = base.load_data()
base.plot_original_y(1-np.mean(dogs_data["Y"], axis=0), ylabel='Probability of shock at trial j')

It's apparent from the experimental data that more than half the dog population learns to avoid shocks within as few as 5 trials, and that the learning doesn't rise significantly with an increased number of trials.

Preprocessing



The following transforms the target label Y to obtain the input data x_avoidance, x_shocked & y, where:

  • x_avoidance : number of shock avoidances before current trial.

  • x_shocked : number of shocks before current trial.

  • y: Status ‘shocked or avoided’ at current trial.

Here the pystan-format data (a python dictionary) is passed to the function above, which preprocesses it into the tensor format required for pyro sampling.
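For intuition, the preprocessing can be sketched with plain numpy, assuming base.transform_data follows the definitions in the bullets above (cumulative avoidances and shocks strictly before each trial, with Y=1 coding an avoidance). transform_sketch below is a hypothetical stand-in for illustration, not the library function:

```python
import numpy as np

def transform_sketch(Y):
    Y = np.asarray(Y, dtype=float)   # Y[i, j] = 1 iff dog i avoided the shock at trial j
    ndogs, ntrials = Y.shape
    # avoidances accumulated strictly before each trial j (zero before trial 0)
    xa = np.concatenate([np.zeros((ndogs, 1)), np.cumsum(Y, axis=1)[:, :-1]], axis=1)
    xs = np.arange(ntrials) - xa     # prior trials minus prior avoidances = prior shocks
    y = 1.0 - Y                      # y = 1 iff the dog got shocked at trial j
    return xa, xs, y

# toy example: one dog, 4 trials -- avoided, shocked, avoided, avoided
xa, xs, y = transform_sketch([[1, 0, 1, 1]])
```

At every trial, xa + xs equals the number of prior trials, which matches the pattern in the sample tensors printed below.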

x_avoidance, x_shocked, y = base.transform_data(**dogs_data)
print("x_avoidance: %s, x_shocked: %s, y: %s"%(x_avoidance.shape, x_shocked.shape, y.shape))
print("\nSample x_avoidance: %s \n\nSample x_shocked: %s"%(x_avoidance[1], x_shocked[1]))

base.plot_original_y(x_avoidance.numpy(), ylabel='Cumulative Avoidances')
base.plot_original_y(x_shocked.numpy(), ylabel='Cumulative Shocked Trials')
x_avoidance: torch.Size([30, 25]), x_shocked: torch.Size([30, 25]), y: torch.Size([30, 25])

Sample x_avoidance: tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
        1., 1., 1., 1., 2., 2., 3.]) 

Sample x_shocked: tensor([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  8.,  9., 10., 11., 12.,
        13., 14., 15., 16., 17., 18., 19., 20., 20., 21., 21.])

The raw responses alone do not reveal much, but looking at the cumulative avoidances and shocks, we see that some dogs never learn (for example, Dogs 1-4), while others learn and retain the learned behaviour (for example, Dogs 25-30).

2. Model Specification


The sampling distribution of the generative model, as indicated earlier, is:

\(y_{ij} \sim Bern(\pi_{ij})\)
\(\log(\pi_{ij}) = \alpha x_{ij} + \beta({j-x_{ij}})\)

Here, \(y_{ij}=1\) if a dog fails to avoid a shock at the j-th trial, and is 0 if it avoids it. We elicit normal priors for \(\alpha,\beta\) with zero mean and large variance (flat) to complete the specification. Also notice that the above model is now in a more familiar form (a Generalized Linear Model, or log-linear model). For convenience, we can define \(X_{a}\equiv x_{ij},\ X_{s}\equiv j-x_{ij} \)

The complete model is:

\(y_{ij} \sim Bern(\pi_{ij})\)
\(\log(\pi_{ij}) = \alpha X_{a} + \beta X_{s}\)

\(\pi_{ij} = e^{(\alpha X_{a} + \beta X_{s})}\)
\(\alpha \sim N(0., 316.)\)
\(\beta \sim N(0., 316.)\)

We experiment with the following variants of the above model:

Model A.
\(y_{ij} \sim Bern(\pi_{ij})\)
\(\pi_{ij} = e^{-(\alpha X_{a} + \beta X_{s})}\)

\(\alpha \sim N(0., 316.)\ I(\alpha>0)\)
\(\beta \sim N(0., 316.)\ I(\beta>0)\)

Model B.
\(y_{ij} \sim Bern(\pi_{ij})\)
\(\pi_{ij} = e^{-(\alpha X_{a} + \beta X_{s})}\)

\(\alpha \sim U(0, 10.)\)
\(\beta \sim U(0, 10.)\)

The above expression is used as a generalised linear model with a log link function in the WinBUGS implementation.

BUGS model



\(\log \pi_j = \alpha\ x_j + \beta\ (j - x_j) \)

Here

  • \(\log\pi_j\) is log probability of a dog getting shocked at trial \(j\)

  • \(x_j\) is number of successful avoidances of shock prior to trial \(j\).

  • \(j-x_j\) is number of shocks experienced prior to trial \(j\).

  • \(\alpha\) is the coefficient corresponding to number of successes, \(\beta\) is the coefficient corresponding to number of failures.

The following code block is from the original BUGS volume:

    {
        for (i in 1 : Dogs) {
            xa[i, 1] <- 0; xs[i, 1] <- 0; p[i, 1] <- 0
            for (j in 2 : Trials) {
                xa[i, j] <- sum(Y[i, 1 : j - 1])
                xs[i, j] <- j - 1 - xa[i, j]
                log(p[i, j]) <- alpha * xa[i, j] + beta * xs[i, j]
                y[i, j] <- 1 - Y[i, j]
                y[i, j] ~ dbern(p[i, j])
            }
        }
        alpha ~ dnorm(0, 0.00001)I(, -0.00001)
        beta ~ dnorm(0, 0.00001)I(, -0.00001)
        A <- exp(alpha)
        B <- exp(beta)
    }

The same model, when implemented in PyStan:

Equivalent Stan model

{
    alpha ~ normal(0.0, 316.2);
    beta  ~ normal(0.0, 316.2);
    for(dog in 1:Ndogs)
        for (trial in 2:Ntrials)  
            y[dog, trial] ~ bernoulli(exp(alpha * xa[dog, trial] + beta * xs[dog, trial]));
      
}  

Model implementation

The above model is defined in base.DogsModel

DogsModel= base.DogsModel

DogsModel
<function chapter01.base.DogsModel(x_avoidance, x_shocked, y, alpha_prior=None, beta_prior=None, activation='exp')>

Let us also draw a few samples from the priors and look at their distributions.

num_samples = 1100
alpha_prior_1a, beta_prior_1a= base.init_priors({"default":dist.HalfNormal(316.)})# Pass priors for model 1A
prior_samples_1a = base.get_prior_samples(alpha_prior_1a, beta_prior_1a, num_samples=num_samples)

alpha_prior_1b, beta_prior_1b= base.init_priors({"default":dist.Uniform(0.001, 10)})# Pass uniform priors for model 1B
prior_samples_1b = base.get_prior_samples(alpha_prior_1b, beta_prior_1b, num_samples=num_samples)

The sampled prior values for alpha & beta are stored in prior_samples_1a and prior_samples_1b above, and are plotted as KDE plots as follows:

base.plot_prior_distributions(model_normal_1a= prior_samples_1a, model_uniform_1b= prior_samples_1b)

3. Prior predictive checking

import time
def simulate_observations_given_param(init_obs=None, num_dogs=1, num_trials=30, 
                                      parameter_pair_list= [(-2.490809, -1.642264)], activation = "exp", print_logs=0, print_exceptions=0):
    t1= time.time()
    simulated_data_given_pair = []
    activation_function = lambda name, value: {"sigmoid": 1/(1+np.exp(-value)), "exp":np.exp(-value)}.get(name)
    for alpha, beta in parameter_pair_list:
        simulated_y= []# Dogs level list
        try:
            for dog in range(num_dogs):
                Y_list = [0]# Trials level list, This holds P(Y=1), where Y is avoidance
                random_trial = init_obs if init_obs else np.random.randint(2)
                for trial in range(num_trials):
                    Y_list.append(random_trial)
                    test_data = {'Y': np.array([Y_list])}
                    test_data["Ndogs"]= len(test_data["Y"])#len(Y_list) 
                    test_data["Ntrials"]= len(Y_list)

                    test_x_avoided, test_x_shocked, _ = base.transform_data(**test_data)
                    if print_logs:
                        print("____\n\nFor Priors alpha: %s | beta: %s | trial: %s"%(alpha, beta, trial))
                        print("test_data: ", test_data)
                        print("trial number %s observations -- %s, %s"%(trial, test_x_avoided, test_x_shocked))

                    calculate_pij = lambda alpha, beta, test_x_avoided, test_x_shocked: activation_function(activation, alpha*np.array(test_x_avoided) + 
                                                                                    beta*np.array(test_x_shocked))

                    Pij_arr= calculate_pij(alpha, beta, test_x_avoided, test_x_shocked)# 𝜋𝑖𝑗=exp(𝛼𝑋𝑎+𝛽𝑋𝑠),𝜋𝑖𝑗 is P(getting shocked)
                    Pij= Pij_arr[0][-1]# Pij, P(getting shocked) for last trial

                    P_avoidance_ij = 1-Pij

                    obs_y= np.random.binomial(1, P_avoidance_ij)# obs_y =1 indicates avoidance as success
                    random_trial = obs_y

                    if print_logs:
                        print("Pij for trial number %s --  %s"%(trial, Pij))
                        print("Next observed trial: %s"%random_trial)

                simulated_y.append(Y_list)
            simulated_data_given_pair.append(simulated_y)
        except Exception as error:
            if print_exceptions:
                print("Issue '%s' with parameter pair: %s"%(error, [alpha, beta]))

    simulated_data_given_pair= np.array(simulated_data_given_pair)
    total_time= time.time()- t1
    print("Total execution time: %s\n"%total_time)

    return simulated_data_given_pair
def simulate_observations_given_prior_posterior_pairs(original_data, init_obs=None, num_dogs=30, num_trials=24, activation_type= "exp", prior_simulations= None, **kwargs):
    
    simulated_arr_list= []
    flag = "posterior" if prior_simulations is not None else "prior"
    note_text= "Here "
    for idx, content in enumerate(kwargs.items()):
        model_name, prior_samples = content
        print("___________\n\nFor model '%s' %s"%(model_name, flag))
        parameters_pairs = list(zip(*list(prior_samples.values())))
        print("total samples count:", len(parameters_pairs), " sample example: ", parameters_pairs[:2])

        simulated_data_given_pair= simulate_observations_given_param(init_obs=init_obs, num_dogs=num_dogs, num_trials=num_trials, 
                                                                     parameter_pair_list= parameters_pairs, activation=activation_type)#, print_logs=1
        print("Number of datasets/%s pairs generated: %s"%(flag, simulated_data_given_pair.shape[0]))
        
        simulated_data_given_pair_flattened = np.reshape(simulated_data_given_pair, (-1, 25))
        
        simulated_arr=np.mean(simulated_data_given_pair_flattened, axis=0)
        print("Shape of data simulated from %s for model '%s' : %s"%(flag, model_name, simulated_arr.shape))
        simulated_arr_list.append(simulated_arr.reshape((1, -1)))
        note_text+="'Dog_%s' corresponds to observations simulated from %s of '%s', "%(idx+1, flag, model_name)
    
    l_idx = idx+1#last index
    original_data_arr = np.mean(original_data, axis=0)# Append original dogs data

    if prior_simulations is not None:
        original_plus_simulated_data= np.concatenate(simulated_arr_list)
        original_plus_simulated_data= np.concatenate([original_plus_simulated_data, prior_simulations])
        for idx, content in enumerate(kwargs.items()):
            model_name, _ = content
            note_text+="'Dog_%s' corresponds to observations simulated from prior of '%s', "%(l_idx+idx+1, model_name)
        ylabel_text= 'Probability of avoidance at trial j [original & simulated priors, posteriors]'
        l_idx+=idx+1
    else:
        simulated_arr_list.append(original_data_arr.reshape((1, -1)))
        original_plus_simulated_data= np.concatenate(simulated_arr_list)
        ylabel_text= 'Probability of avoidance at trial j [original & simulated priors]'
    print("Respective shape of original data: %s and concatenated arrays of data simulated for different priors/posteriors: %s"%(original_data_arr.shape, original_plus_simulated_data.shape))

    note_text = note_text[:-2]
    note_text+=" & 'Dog_%s' corresponds to 'Original data'."%(l_idx+1)
    print("\n%s\n%s\n%s"%("_"*55, "_"*70, note_text))
    
    base.plot_original_y(original_plus_simulated_data, ylabel=ylabel_text)

    return original_plus_simulated_data
    
original_plus_simulated_data = simulate_observations_given_prior_posterior_pairs(dogs_data["Y"], num_dogs=30, num_trials=24, activation_type= "exp", model_normal_1a= prior_samples_1a, model_uniform_1b= prior_samples_1b)

___________

For model 'model_normal_1a' prior
total samples count: 1100  sample example:  [(208.98727416992188, 84.3480224609375), (19.490013122558594, 196.33627319335938)]
Total execution time: 75.78852105140686

Number of datasets/prior pairs generated: 1100
Shape of data simulated from prior for model 'model_normal_1a' : (25,)
___________

For model 'model_uniform_1b' prior
total samples count: 1100  sample example:  [(5.8912200927734375, 0.18410980701446533), (7.265109539031982, 2.5937607288360596)]
Total execution time: 74.3368010520935

Number of datasets/prior pairs generated: 1100
Shape of data simulated from prior for model 'model_uniform_1b' : (25,)
Respective shape of original data: (25,) and concatenated arrays of data simulated for different priors/posteriors: (3, 25)

_______________________________________________________
______________________________________________________________________
Here 'Dog_1' corresponds to observations simulated from prior of 'model_normal_1a', 'Dog_2' corresponds to observations simulated from prior of 'model_uniform_1b' & 'Dog_3' corresponds to 'Original data'.

Even though the mean and median can differ from run to run, looking at the densities, the variance is very large, so the sample mean would also have a huge variance. The take-away is that both \(\alpha, \beta\) are given very weak priors; we more or less rely on the data (evidence) to drive their estimation.

TBD: Prior Sensitivity Analysis

4. Posterior Estimation

In the Bayesian setting, inference is drawn from the posterior. Here, the posterior represents the updated beliefs about the random variables in the wake of the given evidence (data). Formally,

\(Posterior = \frac {Likelihood \times Prior}{Probability \ of \ Evidence}\)

In our case, \(\alpha,\beta\) are the parameters (actually random variables) and \(y\) is the evidence; the posterior \(P(\alpha,\beta | y)\) is given, according to Bayes' rule, as:

\(P\ (\alpha,\beta | y) = \frac {P(y | \alpha,\beta) \pi(\alpha,\beta)}{P(y)}\)

Now our interest is in estimating posterior summaries of the parameters \(\alpha, \beta\). For example, we can look at the posterior mean of \(\alpha\), denoted \(E(\alpha)\). However, in order to get these posterior quantities, we either need to compute the integrals analytically or approximate them via Markov Chain Monte Carlo.

The latter can easily be accomplished in Pyro using the NUTS sampler – NUTS (the No-U-Turn Sampler) is designed to draw samples efficiently from the posterior using Hamiltonian Monte Carlo dynamics.

The following code snippet takes a pyro model object with its prior specification, the input data, and some configuration parameters such as the number of chains and the number of samples per chain. It then launches a NUTS sampler and produces MCMC samples in a python dictionary format.


def DogsModel(x_avoidance, x_shocked, y, alpha_prior= None, beta_prior= None, activation= "exp"):
    """
    Input
    -------
    x_avoidance: tensor holding avoidance count for all dogs & all trials, example for 
                30 dogs & 25 trials, shaped (30, 25)
    x_shocked:   tensor holding shock count for all dogs & all trials, example for 30 dogs
                & 25 trials, shaped (30, 25).
    y:           tensor holding response for all dogs & trials, example for 30 dogs
                & 25 trials, shaped (30, 25).

    Output
    --------
    Implements model 1A, 1B, 2A, 2B
    
    Implements pystan model: {
            alpha ~ normal(0.0, 316.2);
            beta  ~ normal(0.0, 316.2);
            for(dog in 1:Ndogs)  
                for (trial in 2:Ntrials)  
                y[dog, trial] ~ bernoulli(exp(alpha * xa[dog, trial] + beta * xs[dog, trial]));}

    """
    beta_prior = beta_prior if beta_prior else alpha_prior

    activation_function = lambda name, value: {"sigmoid": 1/(1+torch.exp(-value)), "exp": torch.exp(-value)}.get(name)

    alpha = pyro.sample("alpha", alpha_prior)
    beta = pyro.sample("beta", beta_prior)
    with pyro.plate("data"):
        pyro.sample("obs", dist.Bernoulli(activation_function(activation, alpha*x_avoidance + beta*x_shocked)), obs=y)

import matplotlib.pyplot as plt
from functools import partial
from collections import defaultdict

from scipy import stats
import pyro.distributions as dist
from torch import nn
from pyro.nn import PyroModule
from pyro.infer import MCMC, NUTS    

def get_hmc_n_chains(pyromodel, xa, xs, y, num_chains=4, sample_count = 1000, 
                     burnin_percentage = 0.1, thinning_percentage = 0.9, 
                     alpha_prior= None, beta_prior= None, activation= "exp"):
    """
    Input
    -------
    pyromodel: Pyro model object with specific prior distribution
    xa: tensor holding avoidance count for all dogs & all trials
    xs: tensor holding shock count for all dogs & all trials
    y: tensor holding response observation for all dogs & all trials
    num_chains: count of MCMC chains to launch, default 4
    sample_count: count of samples expected per MCMC chain after burn-in & thinning, default 1000
    burnin_percentage: fraction of samples to discard as warm-up, default 0.1
    thinning_percentage: fraction of samples to discard for thinning, default 0.9

    Outputs
    ---------
    hmc_sample_chains: a dictionary with chain names as keys & a dictionary of parameter vs. sampled-values list as values
    hmc_chain_diagnostics: a dictionary with chain names as keys & chain diagnostic metrics from hmc sampling as values

    """
    hmc_sample_chains = defaultdict(dict)
    hmc_chain_diagnostics = defaultdict(dict)

    net_sample_count = round(sample_count/((1 - burnin_percentage)*(1 - thinning_percentage)))

    t1= time.time()
    for idx in range(num_chains):
        num_samples, burnin= net_sample_count, round(net_sample_count*burnin_percentage)
        nuts_kernel = NUTS(pyromodel)
        mcmc = MCMC(nuts_kernel, num_samples=num_samples, warmup_steps=burnin)
        mcmc.run(xa, xs, y, alpha_prior= alpha_prior, beta_prior= beta_prior, activation= activation)
        hmc_sample_chains['chain_{}'.format(idx)]={k: v.detach().cpu().numpy() for k, v in mcmc.get_samples().items()}
        hmc_chain_diagnostics['chain_{}'.format(idx)]= mcmc.diagnostics()

    print("\nTotal time: ", time.time()-t1)
    hmc_sample_chains= dict(hmc_sample_chains)
    hmc_chain_diagnostics= dict(hmc_chain_diagnostics)

    return hmc_sample_chains, hmc_chain_diagnostics
# DogsModel_1A alpha, beta ~ HalfNormal(316.) & activation "exp"

hmc_sample_chains_1a, hmc_chain_diagnostics_1a = get_hmc_n_chains(DogsModel, x_avoidance, x_shocked, y, num_chains=4, sample_count = 900, 
                         alpha_prior= dist.HalfNormal(316.), beta_prior= dist.HalfNormal(316.), activation= "exp")
Sample: 100%|██████████| 11000/11000 [00:52, 211.08it/s, step size=9.31e-01, acc. prob=0.909]
Sample: 100%|██████████| 11000/11000 [00:41, 262.20it/s, step size=9.47e-01, acc. prob=0.881]
Sample: 100%|██████████| 11000/11000 [00:54, 203.34it/s, step size=7.71e-01, acc. prob=0.915]
Sample: 100%|██████████| 11000/11000 [00:54, 203.53it/s, step size=8.01e-01, acc. prob=0.928]
Total time:  202.68139815330505
# DogsModel_1B alpha, beta ~ Uniform(0., 10.0) & activation "exp"

hmc_sample_chains_1b, hmc_chain_diagnostics_1b = get_hmc_n_chains(DogsModel, x_avoidance, x_shocked, y, num_chains=4, sample_count = 900, 
                         alpha_prior= dist.Uniform(0., 10.0), beta_prior= dist.Uniform(0., 10.0), activation= "exp")
Sample: 100%|██████████| 11000/11000 [00:50, 215.79it/s, step size=7.54e-01, acc. prob=0.927]
Sample: 100%|██████████| 11000/11000 [00:49, 220.23it/s, step size=7.89e-01, acc. prob=0.927]
Sample: 100%|██████████| 11000/11000 [00:49, 220.92it/s, step size=8.69e-01, acc. prob=0.916]
Sample: 100%|██████████| 11000/11000 [00:41, 267.35it/s, step size=9.71e-01, acc. prob=0.897]
Total time:  192.20268487930298


hmc_sample_chains holds the sampled MCMC values in the form {"chain_0": {"alpha": [-0.20020795, -0.1829252, -0.18054989, ...], "beta": [...]}, "chain_1": {"alpha": [...], "beta": [...]}, ...}
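Given that layout, posterior summaries can be obtained by pooling each parameter's samples across chains. The following is a minimal sketch; posterior_summary is a hypothetical helper (not part of base), demonstrated on toy arrays rather than the real HMC output:

```python
import numpy as np

def posterior_summary(sample_chains, param):
    # pool the parameter's samples across all chains, then summarize
    pooled = np.concatenate([np.asarray(chain[param]) for chain in sample_chains.values()])
    return {"mean": float(pooled.mean()),
            "q_0.05": float(np.quantile(pooled, 0.05)),
            "q_0.5": float(np.quantile(pooled, 0.5)),
            "q_0.95": float(np.quantile(pooled, 0.95))}

# toy chains standing in for hmc_sample_chains_1a
rng = np.random.default_rng(0)
toy_chains = {"chain_0": {"alpha": rng.normal(0.19, 0.01, 1000)},
              "chain_1": {"alpha": rng.normal(0.19, 0.01, 1000)}}
summary = posterior_summary(toy_chains, "alpha")
```

The quantiles give a credible interval directly, without any asymptotic approximation.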

5. Diagnosing the computational approximation

Just like with any numerical technique, no matter how good the theory is or how robust the implementation, it is always a good idea to check whether the samples drawn are reasonable. In the ideal situation, we expect the samples drawn by the sampler to be independent and identically distributed (i.i.d.) as per the posterior distribution. In practice, this is far from true, as MCMC itself is an approximate technique and a lot can go wrong. In particular, chains may not have converged, or samples may be highly correlated.

We can use both visual and more formal statistical techniques to inspect the quality of the fit (not the model's fit to the data, but how good the approximation is, having accepted the model class for the data at hand) by treating the chains as time-series data; moreover, we can run several chains in parallel and compare them. We do precisely that next.
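One such formal check is the Gelman-Rubin \(\hat{R}\) statistic, which compares between-chain and within-chain variance; values near 1 suggest the chains agree on the same distribution. A minimal split-\(\hat{R}\) sketch follows (split_rhat is a hypothetical helper, not part of base; Pyro's mcmc.diagnostics() reports a comparable quantity):

```python
import numpy as np

def split_rhat(chains):
    """Split-Rhat for one parameter; `chains` is a list of equal-length 1-D sample arrays."""
    halves = []
    for c in chains:                       # split each chain in half to catch within-chain drift
        c = np.asarray(c, dtype=float)
        n = len(c) // 2
        halves += [c[:n], c[n:2 * n]]
    arr = np.stack(halves)                 # shape: (number of half-chains, n draws)
    n = arr.shape[1]
    B = n * arr.mean(axis=1).var(ddof=1)   # between-chain variance
    W = arr.var(axis=1, ddof=1).mean()     # within-chain variance
    var_hat = (n - 1) / n * W + B / n      # pooled posterior-variance estimate
    return float(np.sqrt(var_hat / W))

# well-mixed vs. badly-separated toy chains
rng = np.random.default_rng(1)
rhat_good = split_rhat([rng.normal(size=2000) for _ in range(4)])
rhat_bad = split_rhat([rng.normal(loc=i, size=2000) for i in range(4)])
```

A common rule of thumb is to treat \(\hat{R} > 1.01\) (or, more loosely, 1.1) as a sign that the chains have not yet converged.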

The following snippet builds a parameter vs. chain matrix as a dataframe, which can optionally be saved.

beta_chain_matrix_df_1A = pd.DataFrame(hmc_sample_chains_1a)
beta_chain_matrix_df_1B = pd.DataFrame(hmc_sample_chains_1b)
# Unpruned sample chains
base.save_parameter_chain_dataframe(beta_chain_matrix_df_1A, "data/dogs_parameter_chain_matrix.csv")
Saved at 'data/dogs_parameter_chain_matrix.csv'

Sample chains mixing for Normal priors

The following plots the chains of samples for the alpha & beta parameters:

print("For 'model_HalfNormal_1a'")    
base.plot_chains(beta_chain_matrix_df_1A)


print("\nFor 'model_Uniform_1b'")    
base.plot_chains(beta_chain_matrix_df_1B)
For 'model_HalfNormal_1a'
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_41_1.png ../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_41_2.png
For 'model_Uniform_1b'
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_41_4.png ../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_41_5.png
for chain, samples in hmc_sample_chains_1a.items():
    print("____\nFor model_HalfNormal_1a %s"%chain)
    samples= dict(map(lambda param: (param, torch.tensor(samples.get(param))), samples.keys()))# np arrays to tensors
    print(chain, "Sample count: ", len(samples["alpha"]))
    print("Alpha Q(0.5) :%s | Beta Q(0.5) :%s"%(torch.quantile(samples["alpha"], 0.5), torch.quantile(samples["beta"], 0.5)))
print("%s\n%s"%("_"*60, "_"*60))
for chain, samples in hmc_sample_chains_1b.items():
    print("____\nFor model_Uniform_1B %s"%chain)
    samples= dict(map(lambda param: (param, torch.tensor(samples.get(param))), samples.keys()))# np arrays to tensors
    print(chain, "Sample count: ", len(samples["alpha"]))
    print("Alpha Q(0.5) :%s | Beta Q(0.5) :%s"%(torch.quantile(samples["alpha"], 0.5), torch.quantile(samples["beta"], 0.5)))
____
For model_HalfNormal_1a chain_0
chain_0 Sample count:  10000
Alpha Q(0.5) :tensor(0.1936) | Beta Q(0.5) :tensor(0.0073)
____
For model_HalfNormal_1a chain_1
chain_1 Sample count:  10000
Alpha Q(0.5) :tensor(0.1938) | Beta Q(0.5) :tensor(0.0073)
____
For model_HalfNormal_1a chain_2
chain_2 Sample count:  10000
Alpha Q(0.5) :tensor(0.1937) | Beta Q(0.5) :tensor(0.0073)
____
For model_HalfNormal_1a chain_3
chain_3 Sample count:  10000
Alpha Q(0.5) :tensor(0.1935) | Beta Q(0.5) :tensor(0.0073)
____________________________________________________________
____________________________________________________________
____
For model_Uniform_1B chain_0
chain_0 Sample count:  10000
Alpha Q(0.5) :tensor(0.1938) | Beta Q(0.5) :tensor(0.0073)
____
For model_Uniform_1B chain_1
chain_1 Sample count:  10000
Alpha Q(0.5) :tensor(0.1935) | Beta Q(0.5) :tensor(0.0074)
____
For model_Uniform_1B chain_2
chain_2 Sample count:  10000
Alpha Q(0.5) :tensor(0.1934) | Beta Q(0.5) :tensor(0.0073)
____
For model_Uniform_1B chain_3
chain_3 Sample count:  10000
Alpha Q(0.5) :tensor(0.1936) | Beta Q(0.5) :tensor(0.0074)
for chain in hmc_sample_chains_1a.keys():
    print("____\nFor %s"%chain)
    chain_list= list(hmc_sample_chains_1a[chain].values())
    chain_list.extend(list(hmc_sample_chains_1b[chain].values()))

    title= "parameter distribution for : %s"%(chain)
    fig = ff.create_distplot(chain_list, ["alpha_1a", "beta_1a", "alpha_1b", "beta_1b"])
    fig.update_layout(title=title, xaxis_title="parameter values", yaxis_title="density", legend_title="parameters")
    fig.show()
    break# plot only chain_0
____
For chain_0

Auto-correlation plots for sample chains with Normal priors

base.autocorrelation_plots(beta_chain_matrix_df_1A)
base.autocorrelation_plots(beta_chain_matrix_df_1B)
# base.autocorrelation_plots(beta_chain_matrix_df)

For alpha, the thinning factor for chain_0 is 3, chain_1 is 3, chain_2 is 3, chain_3 is 3

For beta, the thinning factor for chain_0 is 3, chain_1 is 3, chain_2 is 3, chain_3 is 3
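The thinning factors above are read off the autocorrelation plots. As a minimal sketch of how such a factor can be chosen programmatically (this is an illustration, not the `base.autocorrelation_plots` implementation, and `pick_thinning_factor` with its `cutoff` are assumptions), one can pick the smallest lag at which the sample autocorrelation drops below a cutoff:

```python
import numpy as np

def pick_thinning_factor(samples, cutoff=0.1, max_lag=50):
    """Return the smallest lag at which the sample ACF falls below `cutoff`."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    for lag in range(1, max_lag + 1):
        acf = np.dot(x[:-lag], x[lag:]) / (len(x) * var)
        if abs(acf) < cutoff:
            return lag
    return max_lag

# An AR(1)-like chain with known autocorrelation (~0.5 ** lag), for illustration
rng = np.random.default_rng(0)
chain = np.zeros(10000)
for t in range(1, len(chain)):
    chain[t] = 0.5 * chain[t - 1] + rng.normal()

print(pick_thinning_factor(chain))
```

For the AR(1) chain above the ACF decays as roughly 0.5 per lag, so the cutoff of 0.1 is crossed around lag 4, which matches the order of the factors used below.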

thinning_dict = {"chain_0": {"alpha":4, "beta":4}, "chain_1": {"alpha":4, "beta":4}, 
                 "chain_2": {"alpha":4, "beta":4}, "chain_3": {"alpha":4, "beta":4}}

pruned_hmc_sample_chains_1a = base.prune_hmc_samples(hmc_sample_chains_1a, thinning_dict)

pruned_hmc_sample_chains_1b = base.prune_hmc_samples(hmc_sample_chains_1b, thinning_dict)
-------------------------
Original sample counts for 'chain_0' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_0' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_0' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_1' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_1' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_1' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_2' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_2' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_2' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_3' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_3' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_3' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_0' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_0' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_0' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_1' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_1' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_1' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_2' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_2' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_2' parameters: {'alpha': (2500,), 'beta': (2500,)}


-------------------------
Original sample counts for 'chain_3' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_3' parameters: {'alpha': 4, 'beta': 4} 
Post thining sample counts for 'chain_3' parameters: {'alpha': (2500,), 'beta': (2500,)}

Gelman-Rubin statistic for Normal priors

grubin_values = base.gelman_rubin_stats(pruned_hmc_sample_chains_1a)

grubin_values = base.gelman_rubin_stats(pruned_hmc_sample_chains_1b)
Gelmen-rubin for 'param' alpha all chains is: 0.9996

Gelmen-rubin for 'param' beta all chains is: 0.9997

Gelmen-rubin for 'param' alpha all chains is: 0.9998

Gelmen-rubin for 'param' beta all chains is: 0.9997
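The values near 1.0 above indicate the chains agree. As a minimal sketch of the underlying computation (an illustration of the standard formula, not the `base.gelman_rubin_stats` implementation):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for m equal-length 1-D chains."""
    chains = np.asarray(chains, dtype=float)   # shape (m, n)
    _, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled posterior variance estimate
    return float(np.sqrt(var_hat / W))

rng = np.random.default_rng(1)
well_mixed = rng.normal(size=(4, 2500))        # four chains sampling the same target
print(round(gelman_rubin(well_mixed), 4))
```

Chains drawn from the same target give a value close to 1; shifting each chain's mean apart inflates the between-chain variance B and pushes the statistic well above 1.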
# Following diagnostic table is for Un-pruned chains from sampler.
base.get_chain_diagnostics(hmc_chain_diagnostics)
                             acceptance rate  divergences  metric_values
parameters  chain    metric
beta        chain_0  n_eff   0.9728           []           4522.096191
                     r_hat   0.9728           []           0.999932
            chain_1  n_eff   0.9879           []           3938.417236
                     r_hat   0.9879           []           1.000102
            chain_2  n_eff   0.9909           []           3906.434570
                     r_hat   0.9909           []           1.000002
            chain_3  n_eff   0.9869           []           3916.631348
                     r_hat   0.9869           []           1.000059
alpha       chain_0  n_eff   0.9728           []           4602.600586
                     r_hat   0.9728           []           0.999912
            chain_1  n_eff   0.9879           []           3553.121338
                     r_hat   0.9879           []           1.000111
            chain_2  n_eff   0.9909           []           4174.396484
                     r_hat   0.9909           []           0.999954
            chain_3  n_eff   0.9869           []           4088.593506
                     r_hat   0.9869           []           0.999901

Defining a sklearn logistic regression model for Pij

dogs_data["Y"]
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
        0, 1, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0],
       [0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0],
       [0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0,
        0, 0, 0],
       [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1,
        0, 1, 1],
       [0, 0, 1, 1, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0,
        0, 0, 0],
       [1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0,
        1, 1, 0],
       [1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1,
        0, 1, 0],
       [1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 0,
        1, 0, 1],
       [0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0,
        1, 0, 1],
       [1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
        1, 1, 0],
       [1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 1,
        1, 0, 1],
       [1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0,
        0, 1, 1],
       [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 0, 1],
       [1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1],
       [1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1]])
## Defining a sklearn logistic regression model for Pij

from sklearn.linear_model import LogisticRegression

data_df= pd.DataFrame({"avoided":np.sum(dogs_data["Y"], axis=1), "shocks":dogs_data["Ntrials"]- np.sum(dogs_data["Y"], axis=1)})
data_df["Pij"]= np.round(np.mean(dogs_data["Y"], axis=1))

Xtrain= data_df[["avoided", "shocks"]].values
ytrain= data_df[["Pij"]].values
data_df.head(5)
avoided shocks Pij
0 0 25 0.0
1 3 22 0.0
2 2 23 0.0
3 2 23 0.0
4 6 19 0.0
clf = LogisticRegression(random_state=0).fit(Xtrain, ytrain)
clf.predict_proba(Xtrain[:2, :])
clf.score(Xtrain, ytrain)

clf.coef_# output alpha, beta
/Users/tachyon/miniconda3/envs/pytorch_x86/lib/python3.8/site-packages/sklearn/utils/validation.py:63: DataConversionWarning:

A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
array([[ 0.65993212, -0.65994749]])
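The DataConversionWarning above is raised because `ytrain` is a column vector; flattening it with `ravel()`, as the warning suggests, silences it. A small self-contained sketch on toy data (the `(avoided, shocks)` rows and labels here are illustrative, not the dogs data):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0, 25], [3, 22], [24, 1], [20, 5]], dtype=float)  # toy (avoided, shocks) rows
y_col = np.array([[0], [0], [1], [1]])  # a column vector like this triggers the warning in fit()
y_1d = y_col.ravel()                    # 1-D target of shape (n_samples,) -> no warning

clf = LogisticRegression(random_state=0).fit(X, y_1d)
print(clf.score(X, y_1d))
```

The same one-line change (`ytrain.ravel()`) applies to the `fit` call above.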
??get_hmc_n_chains
# ??DogsModel

2A.
\(y_{ij} \sim Bern(\pi_{ij})\)
\(\pi_{ij} = S(\alpha X_{a} + \beta X_{s})\)

\(\alpha \sim N(0., 316.)\)
\(\beta \sim N(0., 316.)\)
\(S(x) = \frac{1}{1 + e^{-x}}\)

2B.
\(y_{ij} \sim Bern(\pi_{ij})\)
\(\pi_{ij} = S(\alpha X_{a} + \beta X_{s})\)

\(\alpha \sim U(-30.78, 30.78)\)
\(\beta \sim U(-30.78, 30.78)\)
\(S(x) = \frac{1}{1 + e^{-x}}\)
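The sigmoid link used by models 2A/2B behaves quite differently from the exponential link of models 1A/1B; a small sketch comparing the two (illustration only, the lambda names are assumptions):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))   # maps any real x into (0, 1)
exp_link = lambda x: np.exp(-x)                 # a valid probability only when x >= 0

x = np.array([-1.0, 0.0, 5.0])
print(sigmoid(x))
print(exp_link(x))   # note exp_link(-1.0) > 1, not a valid probability
```

This is why the sigmoid models need no sign constraints on \(\alpha X_{a} + \beta X_{s}\), whereas under the exponential link a negative linear predictor produces an invalid probability.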

hmc_sample_chains_2a, hmc_chain_diagnostics_2a = get_hmc_n_chains(DogsModel, x_avoidance, x_shocked, y, num_chains=4, sample_count = 900, 
                         alpha_prior= dist.Normal(0., 316.), beta_prior= dist.Normal(0., 316.), activation= "sigmoid")
Sample: 100%|██████████| 11000/11000 [00:43, 252.59it/s, step size=6.32e-01, acc. prob=0.924]
Sample: 100%|██████████| 11000/11000 [00:50, 218.90it/s, step size=5.56e-01, acc. prob=0.945]
Sample: 100%|██████████| 11000/11000 [00:40, 274.35it/s, step size=7.85e-01, acc. prob=0.890]
Sample: 100%|██████████| 11000/11000 [00:39, 275.90it/s, step size=7.57e-01, acc. prob=0.900]
Total time:  174.32010316848755

hmc_sample_chains_2b, hmc_chain_diagnostics_2b = get_hmc_n_chains(DogsModel, x_avoidance, x_shocked, y, num_chains=4, sample_count = 900, 
                         alpha_prior= dist.Uniform(-30.78, 30.78), beta_prior= dist.Uniform(-30.78, 30.78), activation= "sigmoid")
Sample: 100%|██████████| 11000/11000 [02:48, 65.20it/s, step size=2.93e-01, acc. prob=0.909] 
Sample: 100%|██████████| 11000/11000 [02:48, 65.27it/s, step size=2.47e-01, acc. prob=0.938] 
Sample: 100%|██████████| 11000/11000 [01:48, 101.72it/s, step size=2.26e-01, acc. prob=0.797]
Sample: 100%|██████████| 11000/11000 [01:22, 133.90it/s, step size=2.49e-01, acc. prob=0.941]
Total time:  527.9356551170349

beta_chain_matrix_df_2A = pd.DataFrame(hmc_sample_chains_2a)
beta_chain_matrix_df_2B = pd.DataFrame(hmc_sample_chains_2b)
print("For 'model_normal_2a'")    
base.plot_chains(beta_chain_matrix_df_2A)


print("\nFor 'model_uniform_2b'")    
base.plot_chains(beta_chain_matrix_df_2B)
For 'model_normal_2a'
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_63_1.png ../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_63_2.png
For 'model_uniform_2b'
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_63_4.png ../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_63_5.png
for chain, samples in hmc_sample_chains_2a.items():
    print("____\nFor model_normal_2A %s"%chain)
    samples= dict(map(lambda param: (param, torch.tensor(samples.get(param))), samples.keys()))# convert np arrays to tensors
    print(chain, "Sample count: ", len(samples["alpha"]))
    print("Alpha Q(0.5) :%s | Beta Q(0.5) :%s"%(torch.quantile(samples["alpha"], 0.5), torch.quantile(samples["beta"], 0.5)))

print("%s\n%s"%("_"*60, "_"*60))

for chain, samples in hmc_sample_chains_2b.items():
    print("____\nFor model_uniform_2B %s"%chain)
    samples= dict(map(lambda param: (param, torch.tensor(samples.get(param))), samples.keys()))# convert np arrays to tensors
    print(chain, "Sample count: ", len(samples["alpha"]))
    print("Alpha Q(0.5) :%s | Beta Q(0.5) :%s"%(torch.quantile(samples["alpha"], 0.5), torch.quantile(samples["beta"], 0.5)))
____
For model_normal_2A chain_0
chain_0 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2153) | Beta Q(0.5) :tensor(0.1786)
____
For model_normal_2A chain_1
chain_1 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2155) | Beta Q(0.5) :tensor(0.1785)
____
For model_normal_2A chain_2
chain_2 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2152) | Beta Q(0.5) :tensor(0.1787)
____
For model_normal_2A chain_3
chain_3 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2152) | Beta Q(0.5) :tensor(0.1790)
____________________________________________________________
____________________________________________________________
____
For model_uniform_2B chain_0
chain_0 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2154) | Beta Q(0.5) :tensor(0.1793)
____
For model_uniform_2B chain_1
chain_1 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2155) | Beta Q(0.5) :tensor(0.1795)
____
For model_uniform_2B chain_2
chain_2 Sample count:  10000
Alpha Q(0.5) :tensor(23.7836) | Beta Q(0.5) :tensor(0.3075)
____
For model_uniform_2B chain_3
chain_3 Sample count:  10000
Alpha Q(0.5) :tensor(-0.2162) | Beta Q(0.5) :tensor(0.1798)
for chain in hmc_sample_chains_2a.keys():
    print("____\nFor %s"%chain)
    chain_list= list(hmc_sample_chains_2a[chain].values())
    chain_list.extend(list(hmc_sample_chains_2b[chain].values()))

    title= "parameter distribution for : %s"%(chain)
    fig = ff.create_distplot(chain_list, ["alpha_2a", "beta_2a", "alpha_2b", "beta_2b"])
    fig.update_layout(title=title, xaxis_title="parameter values", yaxis_title="density", legend_title="parameters")
    fig.show()
    break# plot only chain_0
____
For chain_0
# ??simulate_observations_given_prior_posterior_pairs
posterior_parameters_pairs_2a = hmc_sample_chains_2a.get('chain_0')

posterior_parameters_pairs_2b = hmc_sample_chains_2b.get('chain_0')

_ = simulate_observations_given_prior_posterior_pairs(dogs_data["Y"], num_dogs=30, num_trials=24, activation_type= "sigmoid", model_normal_2a= posterior_parameters_pairs_2a, model_uniform_2b= posterior_parameters_pairs_2b)

Based on the simple multiple-line plots, we can see that, in this run, chain_2 behaves differently from the remaining chains, which may simply be due to a different initialization. Otherwise, all chains seem to mix well; we therefore drop chain_2 from the analysis. However, we need to be cautious about dropping a chain and should check its effect on the actual predictions, since parameters that look very different numerically can sometimes have very little effect on the likelihood. Nevertheless, it implies that either something is not right about the chain or the model is operating at the edge.
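One simple, hedged way to flag such a chain programmatically is to compare each chain's posterior median against the cross-chain median (`flag_outlier_chains` and its `tol` threshold are a heuristic sketch, not part of `base`):

```python
import numpy as np

def flag_outlier_chains(chains, param="alpha", tol=1.0):
    """Return names of chains whose posterior median for `param`
    deviates from the median-of-medians by more than `tol`."""
    medians = {name: np.median(samples[param]) for name, samples in chains.items()}
    center = np.median(list(medians.values()))
    return [name for name, med in medians.items() if abs(med - center) > tol]

# Toy chains mimicking the run above: three agree, one diverges badly
toy = {"chain_%d" % i: {"alpha": np.full(100, -0.215)} for i in range(3)}
toy["chain_3"] = {"alpha": np.full(100, 23.78)}
print(flag_outlier_chains(toy))
```

Such a flag is only a screening device; as noted above, a flagged chain should still be checked against the predictions before being discarded.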

Descriptive summaries

The following outputs a summary of statistics such as "mean", "std", "Q(0.25)", "Q(0.50)" and "Q(0.75)"; select a statistic's name from the given list to view its values

# chain results, pruned after inspecting the ACF plots

beta_chain_matrix_df = pd.DataFrame(pruned_hmc_sample_chains_1a)

base.summary(beta_chain_matrix_df)
Select any value

We can also report the summary statistics (mean, std and the quartiles Q1-Q3) as tabular data per chain and save the dataframe
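Equivalently, pandas alone can produce such a per-chain summary; a minimal sketch on toy data (`toy_fit_df` and its values are illustrative, not the actual chains):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
toy_fit_df = pd.DataFrame({
    "alpha": rng.normal(-0.2, 0.02, size=200),
    "beta": rng.normal(0.16, 0.02, size=200),
    "chain": np.repeat(["chain_0", "chain_1"], 100),
})

# mean, std and quartiles per chain, mirroring the base.summary output
summary = toy_fit_df.groupby("chain").describe(percentiles=[0.25, 0.5, 0.75])
print(summary["alpha"][["mean", "std", "25%", "50%", "75%"]])
```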

fit_df_1a = pd.DataFrame()
for chain, values in hmc_sample_chains_1a.items():
    param_df = pd.DataFrame(values)
    param_df["chain"]= chain
    fit_df_1a= pd.concat([fit_df_1a, param_df], axis=0)

fit_df_1b = pd.DataFrame()
for chain, values in hmc_sample_chains_1b.items():
    param_df = pd.DataFrame(values)
    param_df["chain"]= chain
    fit_df_1b= pd.concat([fit_df_1b, param_df], axis=0)
fit_df = pd.DataFrame()
for chain, values in pruned_hmc_sample_chains_1a.items():
    param_df = pd.DataFrame(values)
    param_df["chain"]= chain
    fit_df= pd.concat([fit_df, param_df], axis=0)

fit_df.head(3)
alpha beta chain
0 -0.217523 0.143726 chain_0
1 -0.216006 0.186150 chain_0
2 -0.247033 0.160901 chain_0
base.save_parameter_chain_dataframe(fit_df, "data/dogs_classification_hmc_samples.csv")
Saved at 'data/dogs_classification_hmc_samples.csv'
# Use following to load data once the results from pyro sampling operation are saved offline
load_button= base.build_upload_button()

if load_button.value:
    fit_df = base.load_parameter_chain_dataframe(load_button)
fit_df.head(3)
alpha beta chain
0 -0.217523 0.143726 chain_0
1 -0.216006 0.186150 chain_0
2 -0.247033 0.160901 chain_0

The following outputs a similar summary of statistics such as "mean", "std", "Q(0.25)", "Q(0.50)" and "Q(0.75)", but in a slightly different format, given a list of statistic names

base.summary(fit_df, layout =2)
Select any value

The following plots the sampled parameter values as boxplots, with the M parameters side by side on the x axis for each of the N chains

Pass the list of M parameters and the list of N chains, with plot_interactive set to True or False to choose between Plotly and Seaborn

parameters= ["alpha", "beta"]# All parameters for the given model
chains= fit_df["chain"].unique()# All chains sampled for the given model
# Use plot_interactive=False for static seaborn plots offline

base.plot_parameters_for_n_chains(fit_df, chains=list(chains), parameters=parameters, plot_interactive=True)

The following plots the joint distribution of each pair of parameters' sampled values, for all chains

parameters= ["alpha", "beta"]# All parameters for given model
chains= fit_df_1a["chain"].unique()# Number of chains sampled for given model

base.plot_joint_distribution(fit_df_1b, parameters)
Pyro -- alpha Vs. beta
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_85_1.png
base.plot_joint_distribution(fit_df_1a, parameters)
Pyro -- alpha Vs. beta
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_86_1.png

The following plots the pairplot distribution of each parameter against every other parameter's sampled values

sns.pairplot(data=fit_df, hue= "chain");
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_89_0.png

Based on all the above summaries, both visual and descriptive, chain_2 seemed problematic, and it appears almost certain that \(\alpha, \beta < 0\).

TBD: Convergence statistics like Gelman-Rubin have to be implemented.

6. Sensitivity Analysis

Posterior Predictive Checking (PPC) helps examine the fit of a model to real data, since the parameters drawn for simulating conditions & regions of interest come from the posterior distribution. While PPC incorporates model uncertainty (by averaging over all possible models), we take a simpler route to begin with, which is to sample an \(\alpha, \beta\) pair that is very plausible under the posterior (e.g. the posterior means), and simulate data under this particular generative model.
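As a compact illustration of this simpler route, here is a sketch that simulates one dog's trial sequence under the log-linear model from the introduction, \(\pi_j = e^{\alpha x_j + \beta(j - x_j)}\), for a single plausible parameter pair (`simulate_dog` is a hypothetical helper and the \(\alpha, \beta\) values are illustrative, not the fitted posterior means):

```python
import numpy as np

def simulate_dog(alpha, beta, num_trials=25, rng=None):
    """Simulate one dog's trial outcomes under pi_j = exp(alpha*x_j + beta*(j - x_j))."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = []              # y[j] = 1 means the dog was shocked at trial j
    x_avoided = 0       # successful avoidances so far
    for j in range(num_trials):
        x_shocked = j - x_avoided
        p_shock = np.exp(alpha * x_avoided + beta * x_shocked)  # valid prob. for alpha, beta <= 0
        shocked = rng.binomial(1, min(p_shock, 1.0))
        y.append(shocked)
        x_avoided += 1 - shocked
    return y

# Illustrative negative (alpha, beta): shock probability decays as experience accumulates
seq = simulate_dog(alpha=-0.24, beta=-0.08)
print(seq)
```

With both parameters negative, every trial (shock or avoidance) lowers the next trial's shock probability, so simulated sequences tend to start with shocks and end with avoidances, mirroring the learning pattern in the observed data.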

def simulate_observations_given_param(init_obs=None, num_dogs=1, num_trials=30, 
                                      parameter_pair_list= [(-2.490809, -1.642264)], activation = "exp", print_logs=0, print_exceptions=0):
    t1= time.time()
    simulated_data_given_pair = []
    activation_function = lambda name, value: {"sigmoid": 1/(1+np.exp(-value)), "exp":np.exp(-value)}.get(name)
    for alpha, beta in parameter_pair_list:
        simulated_y= []# Dog-level list
        try:
            for dog in range(num_dogs):
                Y_list = [0]# Trial-level list of simulated outcomes, where Y=1 denotes avoidance
                random_trial = init_obs if init_obs else np.random.randint(2)
                for trial in range(num_trials):
                    Y_list.append(random_trial)
                    test_data = {'Y': np.array([Y_list])}
                    test_data["Ndogs"]= len(test_data["Y"])
                    test_data["Ntrials"]= len(Y_list)

                    test_x_avoided, test_x_shocked, _ = base.transform_data(**test_data)
                    if print_logs:
                        print("____\n\nFor Priors alpha: %s | beta: %s | trial: %s"%(alpha, beta, trial))
                        print("test_data: ", test_data)
                        print("trial number %s observations -- %s, %s"%(trial, test_x_avoided, test_x_shocked))

                    calculate_pij = lambda alpha, beta, test_x_avoided, test_x_shocked: activation_function(activation, alpha*np.array(test_x_avoided) + 
                                                                                    beta*np.array(test_x_shocked))

                    Pij_arr= calculate_pij(alpha, beta, test_x_avoided, test_x_shocked)# pi_ij, the probability of getting shocked
                    Pij= Pij_arr[0][-1]# P(getting shocked) for the last trial

                    P_avoidance_ij = 1-Pij

                    obs_y= np.random.binomial(1, P_avoidance_ij)# obs_y=1 indicates avoidance as success
                    random_trial = obs_y

                    if print_logs:
                        print("Pij for trial number %s --  %s"%(trial, Pij))
                        print("Next observed trial: %s"%random_trial)

                simulated_y.append(Y_list)
            simulated_data_given_pair.append(simulated_y)
        except Exception as error:
            if print_exceptions:
                print("Issue '%s' with parameter pair: %s"%(error, [alpha, beta]))

    simulated_data_given_pair= np.array(simulated_data_given_pair)
    total_time= time.time()- t1
    print("Total execution time: %s\n"%total_time)
    return simulated_data_given_pair
def simulate_observations_given_prior_posterior_pairs(original_data, init_obs=None, num_dogs=30, num_trials=24, activation_type= "exp", prior_simulations= None, **kwargs):
    
    simulated_arr_list= []
    flag = "posterior" if prior_simulations is not None else "prior"
    note_text= "Here "
    for idx, content in enumerate(kwargs.items()):
        model_name, prior_samples = content
        print("___________\n\nFor model '%s' %s"%(model_name, flag))
        parameters_pairs = list(zip(*list(prior_samples.values())))
        print("total samples count:", len(parameters_pairs), " sample example: ", parameters_pairs[:2])

        simulated_data_given_pair= simulate_observations_given_param(init_obs=init_obs, num_dogs=num_dogs, num_trials=num_trials, 
                                                                     parameter_pair_list= parameters_pairs, activation=activation_type)#, print_logs=1
        print("Number of datasets/%s pairs generated: %s"%(flag, simulated_data_given_pair.shape[0]))
        
        simulated_data_given_pair_flattened = np.reshape(simulated_data_given_pair, (-1, 25))
        
        simulated_arr=np.mean(simulated_data_given_pair_flattened, axis=0)
        print("Shape of data simulated from %s for model '%s' : %s"%(flag, model_name, simulated_arr.shape))
        simulated_arr_list.append(simulated_arr.reshape((1, -1)))
        note_text+="'Dog_%s' corresponds to observations simulated from %s of '%s', "%(idx+1, flag, model_name)
    
    l_idx = idx+1#last index
    original_data_arr = np.mean(original_data, axis=0)# Append original dogs data

    if prior_simulations is not None:
        original_plus_simulated_data= np.concatenate(simulated_arr_list)
        original_plus_simulated_data= np.concatenate([original_plus_simulated_data, prior_simulations])
        for idx, content in enumerate(kwargs.items()):
            model_name, _ = content
            note_text+="'Dog_%s' corresponds to observations simulated from prior of '%s', "%(l_idx+idx+1, model_name)
        ylabel_text= 'Probability of avoidance at trial j [original & simulated priors, posteriors]'
        l_idx+=idx+1
    else:
        simulated_arr_list.append(original_data_arr.reshape((1, -1)))
        original_plus_simulated_data= np.concatenate(simulated_arr_list)
        ylabel_text= 'Probability of avoidance at trial j [original & simulated priors]'
    print("Respective shape of original data: %s and concatenated arrays of data simulated for different priors/posteriors: %s"%(original_data_arr.shape, original_plus_simulated_data.shape))

    note_text = note_text[:-2]
    note_text+=" & 'Dog_%s' corresponds to 'Original data'."%(l_idx+1)
    print("\n%s\n%s\n%s"%("_"*55, "_"*70, note_text))
    
    base.plot_original_y(original_plus_simulated_data, ylabel=ylabel_text)

    return original_plus_simulated_data
    

# original_plus_simulated_data = simulate_observations_given_prior_posterior_pairs(dogs_data["Y"], num_dogs=2, num_trials=5, activation_type= "exp", model_normal_1a= prior_samples_1a, model_uniform_1b= prior_samples_1b)
# base.simulate_observations_given_prior_pairs(dogs_data["Y"], init_obs=0, num_dogs=2, num_trials=5, activation_type= "exp", model_normal_1a= prior_samples_1a, model_uniform_1b= prior_samples_1b)
posterior_parameters_pairs_1a = hmc_sample_chains_1a.get('chain_0')

posterior_parameters_pairs_1b = hmc_sample_chains_1b.get('chain_0')

original_plus_simulated_data_posterior_ = simulate_observations_given_prior_posterior_pairs(1-dogs_data["Y"], 
                                                                                           num_dogs=2, num_trials=5, activation_type= "exp", prior_simulations=original_plus_simulated_data, model_normal_1a= posterior_parameters_pairs_1a, model_uniform_1b= posterior_parameters_pairs_1b)
___________

For model 'model_normal_1a' posterior
total samples count: 10000  sample example:  [(0.20342435, 0.0077009345), (0.1890759, 0.009231471)]
Total execution time: 6.548594951629639

Number of datasets/posterior pairs generated: 10000
Shape of data simulated from posterior for model 'model_normal_1a' : (25,)
___________

For model 'model_uniform_1b' posterior
total samples count: 10000  sample example:  [(0.2120226, 0.0045217406), (0.18117085, 0.0058743265)]
Total execution time: 5.893326044082642

Number of datasets/posterior pairs generated: 10000
Shape of data simulated from posterior for model 'model_uniform_1b' : (25,)
Respective shape of original data: (25,) and concatenated arrays of data simulated for different priors/posteriors: (5, 25)

_______________________________________________________
______________________________________________________________________
Here 'Dog_1' corresponds to observations simulated from posterior of 'model_normal_1a', 'Dog_2' corresponds to observations simulated from posterior of 'model_uniform_1b', 'Dog_3' corresponds to observations simulated from prior of 'model_normal_1a', 'Dog_4' corresponds to observations simulated from prior of 'model_uniform_1b' & 'Dog_5' corresponds to 'Original data'.
# # With priors & posteriors
# posterior_parameters_pairs_1a = hmc_sample_chains_1a.get('chain_0')

# posterior_parameters_pairs_1b = hmc_sample_chains_1b.get('chain_0')

# original_plus_simulated_data_posterior = simulate_observations_given_prior_posterior_pairs(dogs_data["Y"], 
#                                                                                            num_dogs=30, num_trials=24, activation_type= "sigmoid", prior_simulations=original_plus_simulated_data, model_normal_1a= posterior_parameters_pairs_1a, model_uniform_1b= posterior_parameters_pairs_1b)
___________

For model 'model_normal_1a' posterior
total samples count: 10000  sample example:  [(0.20342435, 0.0077009345), (0.1890759, 0.009231471)]
Total execution time: 20952.46205687523

Number of datasets/posterior pairs generated: 10000
Shape of data simulated from posterior for model 'model_normal_1a' : (25,)
___________

For model 'model_uniform_1b' posterior
total samples count: 10000  sample example:  [(0.2120226, 0.0045217406), (0.18117085, 0.0058743265)]
Total execution time: 2155.898931980133

Number of datasets/posterior pairs generated: 10000
Shape of data simulated from posterior for model 'model_uniform_1b' : (25,)
Respective shape of original data: (25,) and concatenated arrays of data simulated for different priors/posteriors: (5, 25)

_______________________________________________________
______________________________________________________________________
Here 'Dog_1' corresponds to observations simulated from posterior of 'model_normal_1a', 'Dog_2' corresponds to observations simulated from posterior of 'model_uniform_1b', 'Dog_3' corresponds to observations simulated from prior of 'model_normal_1a', 'Dog_4' corresponds to observations simulated from prior of 'model_uniform_1b' & 'Dog_5' corresponds to 'Original data'.
# posterior_parameters_pairs_1a = hmc_sample_chains_2a.get('chain_0')

# posterior_parameters_pairs_1b = hmc_sample_chains_2b.get('chain_0')

# original_plus_simulated_data_posterior = simulate_observations_given_prior_posterior_pairs(dogs_data["Y"], 
#                                                                                            num_dogs=30, num_trials=24, activation_type= "sigmoid", prior_simulations=original_plus_simulated_data, model_normal_1a= posterior_parameters_pairs_1a, model_uniform_1b= posterior_parameters_pairs_1b)
# posterior_parameters_pairs_c1= pruned_hmc_sample_chains.get('chain_0')

# posterior_parameters_pairs_c1 = list(zip(*list(posterior_parameters_pairs_c1.values())))
# print("total posterior samples count in chain:", len(posterior_parameters_pairs_c1), " sample example: ", posterior_parameters_pairs_c1[:2])
total posterior samples count in chain: 2500  sample example:  [(-0.21752325, 0.1437263), (-0.21600601, 0.18614991)]
# # simulated_data_given_pair_posterior= simulate_observations_given_param(num_dogs=30, num_trials=24, parameter_pair_list= posterior_parameters_pairs_c1)#, print_logs=1
# simulated_data_given_pair_posterior= base.simulate_observations_given_param(init_obs=0, num_dogs=30, num_trials=24, parameter_pair_list= posterior_parameters_pairs_c1)#, print_logs=1
# print("Number of datasets/posterior pairs generated: ", simulated_data_given_pair_posterior.shape[0])
Total execution time: 170.5323257446289

Number of datasets/posterior pairs generated:  2500
# simulated_data_given_pair_posterior_flattened = np.reshape(simulated_data_given_pair_posterior, (-1, 25))

# arr1 = np.mean(dogs_data["Y"], axis=0)# Original
# arr2=np.mean(simulated_data_given_pair_flattened, axis=0)# Priors
# arr3=np.mean(simulated_data_given_pair_posterior_flattened, axis=0)# Posterior

# original_plus_simulated_data_plus_posterior= np.concatenate([arr1.reshape((1, -1)), arr2.reshape((1, -1)), arr3.reshape((1, -1))])

# print("respective shapes of original data: %s, data simulated from prior: %s, data simulated from posterior: %s and concatenated arrays: %s"%(arr1.shape, arr2.shape, arr3.shape, original_plus_simulated_data_plus_posterior.shape))
respective shapes of original data: (25,), data simulated from prior: (25,), data simulated from posterior: (25,) and concatenated arrays: (3, 25)
# base.plot_original_y(original_plus_simulated_data_plus_posterior, ylabel='Probability of avoidance at trial j [all original & simulated prior data]')

# print("\n\nHere 'Dog 1' corresponds to Original data, 'Dog 2' corresponds to data simulated from prior & 'Dog 3' corresponds to data simulated from posterior")
Here 'Dog 1' corresponds to Original data, 'Dog 2' corresponds to data simulated from prior & 'Dog 3' corresponds to data simulated from posterior
# base.plot_original_y(original_plus_simulated_data_plus_posterior, ylabel='Probability of avoidance at trial j [all original & simulated prior data]')

# print("\n\nHere 'Dog 1' corresponds to Original data, 'Dog 2' corresponds to data simulated from prior & 'Dog 3' corresponds to data simulated from posterior")
Here 'Dog 1' corresponds to Original data, 'Dog 2' corresponds to data simulated from prior & 'Dog 3' corresponds to data simulated from posterior

7. Model Comparison

More often than not, there may be many plausible models that can explain the data. Sometimes the modeling choice is based on domain knowledge; sometimes it is a matter of computational convenience, as is often the case with the choice of priors. One way to consider different models is to elicit different prior distributions.

As long as the sampling distribution is the same, one can use the Deviance Information Criterion (DIC) to guide model comparison.

Deviance Information Criterion

DIC is computed as follows:

\(D(\alpha,\beta) = -2\ \sum_{i=1}^{n} \log P(y_{i} \mid \alpha,\beta)\)

\(\log P(y_{i} \mid \alpha,\beta)\) is the log likelihood of the shocks/avoidances observed given parameters \(\alpha,\beta\); this expression expands as follows:

\[D(\alpha,\beta) = -2\ \sum_{i=1}^{30}[ y_{i}\ (\alpha Xa_{i}\ +\beta\ Xs_{i}) + \ (1-y_{i})\log\ (1\ -\ e^{(\alpha Xa_{i}\ +\beta\ Xs_{i})})]\]

Using \(D(\alpha,\beta)\) to Compute DIC

\(\overline{D}(\alpha,\beta) = \frac{1}{T} \sum_{t=1}^{T} D(\alpha_{t},\beta_{t})\)

\(\overline{\alpha} = \frac{1}{T} \sum_{t=1}^{T}\alpha_{t}\) \(\overline{\beta} = \frac{1}{T} \sum_{t=1}^{T}\beta_{t}\)

\(D(\overline\alpha,\overline\beta) = -2\ \sum_{i=1}^{30}[ y_{i}\ (\overline\alpha Xa_{i}\ +\overline\beta\ Xs_{i}) + \ (1-y_{i})\log\ (1\ -\ e^{(\overline\alpha Xa_{i}\ +\overline\beta\ Xs_{i})})]\)


Therefore, finally:

\(DIC\ =\ 2\ \overline D(\alpha,\beta)\ -\ D(\overline\alpha,\overline\beta)\)
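The formulas above can be sketched directly in NumPy. This is a hypothetical minimal implementation, not the notebook's own helpers (`base.calculate_deviance_given_param` & `base.DIC` are the actual ones used below); it assumes posterior draws given as 1-D arrays, and \(\alpha x_{a} + \beta x_{s} < 0\) so that \(e^{\alpha x_{a} + \beta x_{s}}\) is a valid probability:

```python
import numpy as np

def deviance(alpha, beta, xa, xs, y):
    """D(alpha, beta) = -2 * sum of Bernoulli log-likelihoods, where
    pi = exp(alpha*xa + beta*xs) is the probability of getting shocked."""
    log_pi = alpha * xa + beta * xs                               # log pi, assumed < 0
    log_lik = y * log_pi + (1 - y) * np.log1p(-np.exp(log_pi))
    return -2.0 * log_lik.sum()

def dic(alphas, betas, xa, xs, y):
    # Mean deviance over the T posterior draws: D-bar
    d_bar = np.mean([deviance(a, b, xa, xs, y) for a, b in zip(alphas, betas)])
    # Deviance at the posterior means: D(alpha-bar, beta-bar)
    d_hat = deviance(np.mean(alphas), np.mean(betas), xa, xs, y)
    return 2.0 * d_bar - d_hat
```

Here `xa`, `xs`, `y` stand for the avoidance counts, shock counts, and binary outcomes, shaped `(num_dogs, num_trials)`.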




The following method computes the deviance given parameters `alpha` & `beta`:
#launch docstring for calculate_deviance_given_param

#launch docstring for calculate_mean_deviance

The following method computes the deviance information criterion for a given Bayesian model & chains of sampled parameters `alpha` & `beta`:


#launch docstring for DIC

#launch docstring for compare_DICs_given_model

Define alternate model with different priors

The following model is defined in Pyro in the same manner, per the same generative model for this dataset, with the only modification being the prior distribution: Uniform rather than Normal.

Instead of Normal priors on \(\alpha\) and \(\beta\), we consider uniform priors, i.e., \(\alpha\) ~ \(U(-30.79, 30.79)\), \(\beta\) ~ \(U(-30.79, 30.79)\).

# # Dogs model with uniform prior
#launch docstring for DogsModelUniformPrior

DogsModelUniformPrior= base.DogsModelUniformPrior
DogsModelUniformPrior
<function chapter01.base.DogsModelUniformPrior(x_avoidance, x_shocked, y)>
# # ??base.DogsModelUniformPrior

# def DogsModelUniformPrior(x_avoidance, x_shocked, y):
#     """
#     Input
#     -------
#     x_avoidance: tensor holding avoidance count for all dogs & all trials, example for 
#                 30 dogs & 25 trials, shaped (30, 25)
#     x_shocked:   tensor holding shock count for all dogs & all trials, example for 30 dogs
#                 & 25 trials, shaped (30, 25).
#     y:           tensor holding response for all dogs & trials, example for 30 dogs
#                 & 25 trials, shaped (30, 25).

#     Output
#     --------
#     Implements pystan model: {
#             alpha ~ uniform(-30.79, 30.79);
#             beta  ~ uniform(-30.79, 30.79);
#             for(dog in 1:Ndogs)  
#                 for (trial in 2:Ntrials)  
#                 y[dog, trial] ~ bernoulli(exp(alpha * xa[dog, trial] + beta * xs[dog, trial]));}

#     """
#     sigmoid = lambda value: 1/(1+torch.exp(-value))
#     alpha = pyro.sample("alpha", dist.Uniform(-10, -0.0001))
#     beta = pyro.sample("beta", dist.Uniform(-10, -0.0001))
#     with pyro.plate("data"):
#         pyro.sample("obs", dist.Bernoulli(sigmoid(alpha*x_avoidance + beta * x_shocked)), obs=y)
hmc_sample_chains_uniform_prior, hmc_sample_chains_uniform_diagnostics= base.get_hmc_n_chains(DogsModelUniformPrior, x_avoidance, x_shocked, y, num_chains=4, sample_count= 900)
Sample: 100%|██████████| 11000/11000 [00:54, 202.04it/s, step size=7.44e-01, acc. prob=0.903]
Sample: 100%|██████████| 11000/11000 [00:56, 195.65it/s, step size=7.17e-01, acc. prob=0.902]
Sample: 100%|██████████| 11000/11000 [04:14, 43.26it/s, step size=7.33e-01, acc. prob=0.880] 
Sample: 100%|██████████| 11000/11000 [01:04, 171.18it/s, step size=5.38e-01, acc. prob=0.952]
Total time:  429.8291800022125

Sample chains mixing for Uniform priors

The following plots the chains of samples for the `alpha` & `beta` parameters with uniform priors:

beta_chain_matrix_uniform_df = pd.DataFrame(hmc_sample_chains_uniform_prior)

base.plot_chains(beta_chain_matrix_uniform_df)
../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_115_0.png ../../_images/Ch01_dogs_log_linear_Pyro_DISCARDED_115_1.png

Auto-correlation plots for sample chains with Uniform priors

base.autocorrelation_plots(beta_chain_matrix_uniform_df)
thinning_dict_uniform = {"chain_0": {"alpha": 3, "beta": 3}, "chain_1": {"alpha": 3, "beta": 3},
                "chain_2": {"alpha": 3, "beta": 3}, "chain_3": {"alpha": 3, "beta": 3}}


pruned_hmc_sample_chains_uniform_prior = base.prune_hmc_samples(hmc_sample_chains_uniform_prior, thinning_dict_uniform)
-------------------------
Original sample counts for 'chain_0' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_0' parameters: {'alpha': 3, 'beta': 3} 
Post thining sample counts for 'chain_0' parameters: {'alpha': (3333,), 'beta': (3333,)}


-------------------------
Original sample counts for 'chain_1' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_1' parameters: {'alpha': 3, 'beta': 3} 
Post thining sample counts for 'chain_1' parameters: {'alpha': (3333,), 'beta': (3333,)}


-------------------------
Original sample counts for 'chain_2' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_2' parameters: {'alpha': 3, 'beta': 3} 
Post thining sample counts for 'chain_2' parameters: {'alpha': (3333,), 'beta': (3333,)}


-------------------------
Original sample counts for 'chain_3' parameters: {'alpha': (10000,), 'beta': (10000,)}

Thining factors for 'chain_3' parameters: {'alpha': 3, 'beta': 3} 
Post thining sample counts for 'chain_3' parameters: {'alpha': (3333,), 'beta': (3333,)}
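Thinning by a factor \(k\) simply keeps every \(k\)-th draw of a chain to reduce autocorrelation. A minimal sketch, assuming each chain is a dict of parameter name to 1-D array of draws (the chapter's `base.prune_hmc_samples` is the actual helper and may differ in details such as how the retained count is rounded):

```python
import numpy as np

def thin_chain(chain, factors):
    # chain:   dict mapping parameter name -> 1-D array of draws
    # factors: dict mapping parameter name -> thinning factor k (keep every k-th draw)
    return {param: np.asarray(draws)[::factors[param]]
            for param, draws in chain.items()}
```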

Gelman-Rubin statistic for Uniform priors

grubin_values_uniform = base.gelman_rubin_stats(pruned_hmc_sample_chains_uniform_prior)
Gelmen-rubin for 'param' alpha all chains is: 0.9998

Gelmen-rubin for 'param' beta all chains is: 0.9997
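For reference, the Gelman-Rubin statistic \(\hat{R}\) compares between-chain and within-chain variance; values near 1 indicate the chains have mixed. A minimal sketch, assuming equal-length chains stacked in a 2-D array and omitting refinements such as split chains (the chapter's `base.gelman_rubin_stats` is the actual implementation used above):

```python
import numpy as np

def gelman_rubin(chains):
    # chains: array of shape (m, n) -- m chains, n draws each, for one parameter
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    B = n * chain_means.var(ddof=1)            # between-chain variance
    var_plus = (n - 1) / n * W + B / n         # pooled estimate of posterior variance
    return np.sqrt(var_plus / W)
```

Well-mixed chains give \(\hat{R} \approx 1\); chains stuck in different regions give \(\hat{R}\) noticeably above 1.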

Compute & compare the deviance information criterion for multiple Bayesian models

base.compare_DICs_given_model(x_avoidance, x_shocked, y, Dogs_normal_prior= pruned_hmc_sample_chains, Dogs_uniform_prior= pruned_hmc_sample_chains_uniform_prior)
______________________________

For model : Dogs_normal_prior
. . .DIC for chain_0: 472.8
. . .DIC for chain_1: 472.894
. . .DIC for chain_2: 472.914
. . .DIC for chain_3: 472.935

. .Mean Deviance information criterion for all chains: 472.886

______________________________

For model : Dogs_uniform_prior
. . .DIC for chain_0: 568.974
. . .DIC for chain_1: 568.883
. . .DIC for chain_2: 568.773
. . .DIC for chain_3: 568.799

. .Mean Deviance information criterion for all chains: 568.857

The DIC for the model with Normal priors (~472.9) is substantially lower than for the model with Uniform priors (~568.9), so the fits are not equivalent. Following the rule book, we pick the model with the smallest DIC; in that case, we have to pick Normal priors over Uniform priors.

8. Inference & Analysis

Alright, we have a model, and we are reasonably sure about the fit (both numerical and conceptual). But so what? The purpose of model building is to use these models as probing devices: that is, can we use the models to answer questions about the reality they have abstracted?

We choose the model with Normal priors, and pick samples from one particular chain of HMC samples, say `chain_3`.

for chain, samples in pruned_hmc_sample_chains.items():
    samples= dict(map(lambda param: (param, torch.tensor(samples.get(param))), samples.keys()))# np array to tensors
    print(chain, "Sample count: ", len(samples["alpha"]))
chain_0 Sample count:  2500
chain_1 Sample count:  2500
chain_2 Sample count:  2500
chain_3 Sample count:  2500

Plot density for parameters from chain_3 to visualise the spread of sample values from that chain

title= "parameter distribution for : %s"%(chain)
fig = ff.create_distplot(list(map(lambda x:x.numpy(), samples.values())), list(samples.keys()))
fig.update_layout(title=title, xaxis_title="parameter values", yaxis_title="density", legend_title="parameters")
fig.show()

print("Alpha Q(0.5) :%s | Beta Q(0.5) :%s"%(torch.quantile(samples["alpha"], 0.5), torch.quantile(samples["beta"], 0.5)))
Alpha Q(0.5) :tensor(-0.2148) | Beta Q(0.5) :tensor(0.1786)

Plot density & contours for both parameters from chain_3 to visualise the joint distribution & region of interest

#Choosing samples from chain 3
chain_samples_df= fit_df[fit_df["chain"]==chain].copy()# chain is 'chain_3' 

alpha= chain_samples_df["alpha"].tolist()
beta= chain_samples_df["beta"].tolist()
colorscale = ['#7A4579', '#D56073', 'rgb(236,158,105)', (1, 1, 0.2), (0.98,0.98,0.98)]
fig = ff.create_2d_density(alpha, beta, colorscale=colorscale, hist_color='rgb(255, 255, 150)', point_size=4, title= "alpha beta joint density plot")
fig.update_layout( xaxis_title="x (alpha)", yaxis_title="y (beta)")

fig.show()

Note: The distribution of alpha values is entirely negative and offset well to the left of the beta values (median \(\alpha \approx -0.215\) vs \(\beta \approx 0.179\)); thus, for any given observation, the likelihood of getting shocked is influenced more by a small measure of avoidance than by getting shocked.

Observations:

On observing the joint distribution of \(\alpha, \beta\), we note that \(\beta > \alpha\) and \(\beta\) is closer to zero. Here, \(\beta\) can be interpreted as learning ability, i.e., the ability of a dog to learn from shock experiences. An increase in the number of shocks barely raises the probability of non-avoidance (the value of \(\pi_j\)); unless the trials & shocks grow considerably, \(\pi_j\) hardly declines and mostly stays around 0.9.

However, it is not the case with \(\alpha\): \(\alpha\) is more negative & farther from zero. Even the few instances where the dog avoids the shock impart a significant decline in non-avoidance (\(\pi_j\)); therefore, \(\alpha\) can be interpreted as retention ability, i.e., the ability to retain the learning from previous shock experiences.

print(chain_samples_df["alpha"].describe(),"\n\n", chain_samples_df["beta"].describe())
count    2500.000000
mean       -0.215691
std         0.016598
min        -0.280845
25%        -0.226861
50%        -0.214832
75%        -0.204219
max        -0.162974
Name: alpha, dtype: float64 

 count    2500.000000
mean        0.179158
std         0.020163
min         0.121527
25%         0.164853
50%         0.178601
75%         0.192828
max         0.245223
Name: beta, dtype: float64

From the contour plot above, the following regions of the posterior distribution seem highly plausible for the parameters:

  1. For alpha, -0.23 < alpha < -0.20

  2. For beta, 0.16 < beta < 0.19

Let us look at \(\frac{\alpha}{\beta}\) as a proxy to see which of the two (learning ability and retention ability) is dominant.

We are using \(\frac{\alpha}{\beta}\) as a probing device to answer that question, and similar quantities can be defined. With MCMC samples available, we can get posterior probabilities of any function of the model parameters (here \(\alpha, \beta\)). Say, we may be interested in \(E(\frac{\alpha}{\beta})\) or \(P(\frac{\alpha}{\beta}<1)\).

The latter quantity can be estimated by the Monte Carlo average as follows: \(P(\frac{\alpha}{\beta}<1) = \frac{1}{n}\sum_{t=1}^{n} I(\alpha_{t} < \beta_{t})\), i.e., the fraction of times \(\alpha < \beta\) (equivalent here since \(\beta > 0\)).

x1 = chain_samples_df["alpha"].to_numpy()
x2 = chain_samples_df["beta"].to_numpy()
p = np.mean(x1<x2)
print(p)
1.0

So, the posterior evidence for retention ability outweighing learning ability is overwhelming.

9. Commentary


TBD.